Climate change is a widely debated topic throughout the country and the world, and opinions on it often divide along political and socioeconomic lines. In this report, we investigate sentiment on climate change in different regions of the United States by analyzing news articles from several publications in each region.
Below is a graph of the sentiment range for each publication, with lower values representing more negative words and higher values representing more positive words.
Below is a table containing the values for term frequency (tf), inverse document frequency (idf) and a combination of the two (tf_idf) for each word in each of the articles for each publication.
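To make the three columns concrete, here is a minimal sketch of how tf, idf, and tf_idf relate. The report's own analysis used R's bind_tf_idf function; the Python below is only an illustration, and the function name tf_idf_table and the two toy documents are made up for this example. It assumes tf is a word's count divided by the total words in its document and idf is the natural log of the number of documents divided by the number of documents containing the word, which to my understanding matches the definition bind_tf_idf uses.

```python
import math
from collections import Counter

def tf_idf_table(docs):
    """docs maps a document name to its list of word tokens.

    Returns one row per (document, word):
    (document, word, n, tf, idf, tf_idf).
    """
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}

    # Number of documents in which each word appears at least once.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    rows = []
    for name, word_counts in counts.items():
        total = sum(word_counts.values())  # total words in this document
        for word, n in word_counts.items():
            tf = n / total
            idf = math.log(n_docs / doc_freq[word])  # natural log (assumed)
            rows.append((name, word, n, tf, idf, tf * idf))
    return rows

# Toy, made-up documents (not taken from the report's data).
docs = {
    "article_a": "climate change raises sea levels".split(),
    "article_b": "climate change and hurricane season".split(),
}
rows = tf_idf_table(docs)
for row in rows:
    print(row)
```

A word that appears in every document, such as "climate" in this toy example, gets an idf of zero and therefore a tf_idf of zero, while a word found in only one document, such as "hurricane", gets a positive score.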
For the 100 articles used for the Northwest and Midwest regions, I combined the articles from the five publications in each region into a corpus. To do this, I wrote a function that reads a PDF file and extracts the text between the words ‘Body’ and ‘Classification’, since this is the format in which each article was downloaded from the Nexis Uni website. I used lapply to apply this function to each file in the directory for each publication, creating a table for each publication in which each row holds the text of one article. I then ran sentiment analysis on these tables and displayed the AFINN word positivity values for each news publication. To calculate the term frequency and inverse document frequency, I took the per-publication text tables described above, counted the occurrences of every word in each article, calculated the total number of words in each article, and added the tf, idf, and tf_idf columns with the bind_tf_idf function.
According to this analysis, the sentiment for most of the publications was relatively neutral, with words classified roughly evenly between positive and negative values. Additionally, some of the most commonly occurring words in each article are ‘climate’ and ‘change’. Both findings are likely artifacts of the method used to find the articles and could be reduced with a more sophisticated article search procedure.
As next steps, I would recommend a more detailed search for articles, taking care to control for frequently occurring words and to cover as wide a sentiment range as possible. This could allow a more meaningful analysis of climate change sentiment throughout the country.
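The extraction and scoring steps described above can be sketched roughly as follows. This is not the code used for the report, which was written in R; it is a Python illustration under several assumptions. It assumes the PDF has already been converted to plain text, the names extract_body and score_words are hypothetical, and TOY_LEXICON is a tiny hand-written stand-in with illustrative scores, not the real AFINN word list (AFINN assigns each listed word an integer score from -5 to 5).

```python
import re

def extract_body(text, start="Body", end="Classification"):
    """Return the text between the first `start` marker and the next
    `end` marker, or an empty string if the pair is not found."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""

# Hypothetical stand-in for the AFINN lexicon; scores are illustrative only.
TOY_LEXICON = {"disaster": -2, "threat": -2, "hope": 2, "progress": 2}

def score_words(text, lexicon):
    """Lowercase and tokenize the text, then return the lexicon score of
    every word that has one. Words not in the lexicon are skipped."""
    words = re.findall(r"[a-z']+", text.lower())
    return [lexicon[w] for w in words if w in lexicon]

# Made-up article text in the assumed 'Body' ... 'Classification' layout.
raw = (
    "Some Headline\nBody\n"
    "The storm was a disaster, but there is hope.\n"
    "Classification\nLanguage: ENGLISH"
)
body = extract_body(raw)
scores = score_words(body, TOY_LEXICON)
print(body)
print(scores)
```

One caveat with marker-based extraction is that it takes the first occurrence of the start word, so a headline that happens to contain ‘Body’ would shift the extracted span.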
For each article, I displayed a wordcloud of its unique terms, with each word sized according to its frequency. Below that is a wordcloud color-coded by sentiment (green indicates positive sentiment and red indicates negative sentiment).
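As a rough illustration of how a wordcloud like this is driven by the data, the sketch below turns word counts into font sizes and sentiment scores into colors. It is a Python illustration, not the plotting code behind the report. The function name wordcloud_spec, the linear size scaling, the size limits, the gray color for unscored words, and the toy lexicon are all assumptions made for this example.

```python
from collections import Counter

def wordcloud_spec(tokens, lexicon, min_size=10, max_size=60):
    """Map each unique word to (font_size, color).

    Size grows linearly with the word's count relative to the most
    frequent word. Color is green for a positive lexicon score, red for
    a negative one, and gray for unscored words (an assumption).
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    top = max(counts.values())
    spec = {}
    for word, n in counts.items():
        size = min_size + (max_size - min_size) * n / top
        score = lexicon.get(word, 0)
        if score > 0:
            color = "green"
        elif score < 0:
            color = "red"
        else:
            color = "gray"
        spec[word] = (size, color)
    return spec

# Made-up tokens and illustrative scores (not the real AFINN values).
tokens = ["storm", "storm", "hope", "policy"]
toy_lexicon = {"storm": -2, "hope": 2}
spec = wordcloud_spec(tokens, toy_lexicon)
print(spec)
```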
DataTable of Term Frequency (tf) in South Newspapers and their respective Inverse Document Frequencies (idf)
DataTable of Term Frequency (tf) in West Newspapers and their respective Inverse Document Frequencies (idf)
I combined the 10 articles from the South and West regions into two corpora, one per region, using lists and the “unite” function. I looked at the sentiment analysis for each newspaper and compared it to the corpus as a whole. I also used the bind_tf_idf function to create a data table that displays the tf and idf for terms in the documents.
Some articles skewed positive while others skewed negative across both regions. Altogether, the South and West regions showed similar sentiment levels. However, this should be taken with a grain of salt because the “South” region only encompassed Florida and the “West” region only encompassed California. Both of these states deeply feel the effects of climate change through rising sea levels, intensified storms, and extreme summer temperatures. Therefore, despite their geographical differences, they may still be closely aligned when discussing climate change.
It is hard to conclude much from the term frequency analysis because single words do not mean much without context. “Climate” and “Change”, the search terms used to procure the articles, occurred often in the documents. Furthermore, one of the terms with the highest tf_idf for the West region is “coffee”, but that does not necessarily mean coffee is connected to climate change. The most differentiating textual element is that “Hurricane” appears much more often in the South region than in the West, because the West does not have to worry about those storms.
As for next steps, it would be beneficial to either
A: Capture more regions
or
B: Include more states in each region
Either option would mitigate the effect of the similarities between California and Florida.
Also, I believe collecting more articles would strengthen the validity of any insights from this data, because 10 articles likely do not capture the general sentiment towards climate change.
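The comparison described earlier for the South and West regions, each newspaper's sentiment against the region's corpus as a whole, can be sketched as below. This is a Python illustration rather than the R code behind the report. It assumes the articles are already tokenized, uses the mean lexicon score as the summary statistic (the report does not say which summary it used), and relies on made-up newspaper names, tokens, and scores.

```python
from statistics import mean

def mean_sentiment(tokens, lexicon):
    """Average lexicon score over the scored words; 0.0 if none are scored."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return mean(scores) if scores else 0.0

def compare_to_corpus(articles_by_paper, lexicon):
    """articles_by_paper maps a newspaper name to a list of token lists,
    one list per article. Returns (per-newspaper means, corpus mean)."""
    per_paper = {}
    all_tokens = []
    for paper, articles in articles_by_paper.items():
        # Pool every article from this newspaper into one token list.
        tokens = [t for article in articles for t in article]
        per_paper[paper] = mean_sentiment(tokens, lexicon)
        all_tokens.extend(tokens)
    return per_paper, mean_sentiment(all_tokens, lexicon)

# Made-up data with illustrative scores (not the real AFINN values).
toy_lexicon = {"hope": 2, "storm": -2, "threat": -2}
articles_by_paper = {
    "paper_a": [["hope", "storm"], ["hope", "policy"]],
    "paper_b": [["storm", "threat"]],
}
per_paper, corpus_mean = compare_to_corpus(articles_by_paper, toy_lexicon)
print(per_paper)
print(corpus_mean)
```

With only a handful of articles per newspaper, a single strongly worded article can move these means noticeably, which is one reason a larger sample would help.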
Overall, the regions showed little difference beyond region-specific takeaways (e.g., “Irvine” occurred frequently in West publications). This can likely be attributed to the fact that, no matter one’s stance on climate change, events are occurring that are interpreted both positively and negatively. If a region supports green energy, it will laud the development of renewable energy and begrudge any defense of fossil fuels. Someone with the opposite perspective will view those stories in the opposite way, but because both occur at what seem to be similar rates, the sentiment analysis for any given region ends up balanced. In the future, as discussed earlier, we may want to refine our search term to something more targeted, such as “wind turbine development”, to see if regions perceive the growth of wind energy in different ways. As long as there are ample articles for this type of search, we believe it could yield more insightful results from the text.